Mandarin Chinese Broadcast News Retrieval and Summarization Using Probabilistic Generative Models
نویسندگان
چکیده
This paper presents our recent research work on applying probabilistic generative models to Mandarin Chinese broadcast news retrieval and summarization. Most models can be trained in either a supervised or unsupervised manner. In addition, both literal term matching and concept matching strategies have been intensively investigated. This paper also presents a prototype web-based Mandarin Chinese broadcast news retrieval system, which is based on technologies such as automatic story segmentation, automatic speech recognition, spoken document retrieval and summarization.
منابع مشابه
Extractive spoken document summarization for information retrieval
The purpose of extractive summarization is to automatically select a number of indicative sentences, passages, or paragraphs from the original document according to a target summarization ratio and then sequence them to form a concise summary. In the paper, we proposed the use of probabilistic latent topical information for extractive summarization of spoken documents. Various kinds of modeling...
متن کاملA unified probabilistic generative framework for extractive spoken document summarization
In this paper, we consider extractive summarization of Chinese broadcast news speech. A unified probabilistic generative framework that combined the sentence generative probability and the sentence prior probability for sentence ranking was proposed. Each sentence of a spoken document to be summarized was treated as a probabilistic generative model for predicting the document. Two different mat...
متن کاملThe SoVideo Mandarin Chinese Broadcast News Retrieval System
This paper describes the SoVideo broadcast news retrieval system for Mandarin Chinese. The system is based on technologies such as large-vocabulary continuous speech recognition for Mandarin Chinese, automatic story segmentation, and information retrieval. Currently, the database consists of 177 hours of broadcast news, which yields 3264 stories by automatic story segmentation. We discuss the d...
متن کاملSpeech retrieval of Mandarin broadcast news via mobile devices
This paper presents a system for speech retrieval of Mandarin broadcast news. First, several data-driven and unsupervised approaches are integrated into the broadcast news transcription system to improve the speech recognition accuracy and efficiency. Then, a multi-scale indexing paradigm for broadcast news retrieval is proposed to make use of the special structural properties of the Chinese la...
متن کاملRetrieval of broadcast news speech in Mandarin Chinese collected in Taiwan using syllable-level statistical characteristics
Spoken document retrieval has been extensively studied in recent years because of its high potential in various applications in the near future. Considering the monosyllabic structure of Chinese language, a whole class of indexing features for retrieval of spoken documents in Mandarin Chinese using syllable-level statistical characteristics has been studied, and very encouraging experimental re...
متن کامل